-
Notifications
You must be signed in to change notification settings - Fork 7
WASM support #309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
WASM support #309
Conversation
It adds ~400 MB to the Nix closure which is a lot.
Also, require WASM plugins to export a `nix_wasm_init_v1` function.
We don't want to reuse instances between WASM calls, since that's impure: the WASM function could use some mutable global state to return a different value every time. So instead, pre-instantiate instances, allowing them to be instantiated efficiently. Unfortunately, while this is much faster than not using pre-instantiation, it is slower than reusing an instance: about 1.3s compared to 0.7s for the call.nix test in nix-wasm-plugin-test (which performs 100K WASM calls).
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the 📝 WalkthroughWalkthroughThis pull request integrates Wasmtime and WebAssembly/WASI support into Nix. It adds Wasmtime as a packaged dependency, includes a WASM primop for executing WebAssembly modules from Nix expressions, implements a WASI derivation builder for running WASI workloads, and updates build configurations and platform checks accordingly. Changes
Sequence Diagram(s)sequenceDiagram
participant User as Nix User<br/>(prim_wasm call)
participant EvalState
participant WasmCache as Instance<br/>Cache
participant Wasmtime as Wasmtime<br/>Engine/Linker
participant Module as WASM<br/>Module
participant Instance as WASM<br/>Instance
participant Memory as WASM<br/>Memory
participant HostFns as Host<br/>Functions
User->>EvalState: prim_wasm(path, entry, arg)
EvalState->>WasmCache: lookup instance by path
alt Instance exists
WasmCache-->>EvalState: return cached NixWasmInstancePre
else First time
EvalState->>Wasmtime: create Engine & Linker
EvalState->>HostFns: register regFuns
EvalState->>Module: load module from path
Wasmtime->>Module: instantiate with Linker
Module->>Instance: create instance
Instance->>Memory: access WASM memory
WasmCache->>WasmCache: cache by path
WasmCache-->>EvalState: return new NixWasmInstancePre
end
EvalState->>Instance: locate exported function (entry)
Instance-->>EvalState: function ref
EvalState->>EvalState: prepare arg as Value
EvalState->>Memory: copy arg into WASM memory
EvalState->>Instance: call function with memory id
Instance->>HostFns: invoke host functions as needed
HostFns->>Memory: read/write values
Instance-->>EvalState: return result id
EvalState->>Memory: read result from WASM memory
EvalState-->>User: return as Nix Value
sequenceDiagram
participant Store as Nix<br/>Store
participant Builder as Build<br/>Dispatcher
participant WasiBuilder as WASI<br/>Derivation<br/>Builder
participant Wasmtime as Wasmtime<br/>Engine
participant WasmModule as WASM<br/>Module
participant WasiEnv as WASI<br/>Environment
participant Instance as WASM<br/>Instance
Store->>Builder: makeDerivationBuilder(drv: wasm32-wasip1)
Builder->>Builder: check platform == wasm32-wasip1
alt WASI Platform
Builder->>WasiBuilder: instantiate for WASI build
WasiBuilder->>Wasmtime: new Engine & Linker
WasiBuilder->>WasiEnv: configure WASI<br/>(stdin, stdout, stderr,<br/>argv, env, preopened dirs)
WasiBuilder->>WasmModule: load from drv.builder path
Wasmtime->>Instance: instantiate module
WasiBuilder->>Instance: locate _start export
Instance->>Instance: verify is function
WasiBuilder->>Instance: invoke _start() with no args
Instance->>WasiEnv: interact with WASI calls
Instance-->>WasiBuilder: execution complete
WasiBuilder->>WasiBuilder: exit with code 0
WasiBuilder-->>Store: build complete
else Non-WASI Platform
Builder->>Builder: use platform-specific builder
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Fix all issues with AI Agents 🤖
In @src/libexpr/primops/wasm.cc:
- Around line 255-267: The code reads ValueId uint32_t values directly from WASM
memory in make_list, copy_list, make_attrset, and copy_attrset, assuming host
endianness; fix this by converting each 32-bit word from little-endian to host
order when constructing ValueId (e.g., apply le32toh or an equivalent portable
little-endian-to-host conversion) for every read from the memory subspan (the
vs[] accesses) so ValueId values are correct on big-endian hosts; update all
occurrences in the listed functions to perform this conversion before indexing
into values or assigning into lists/attrsets.
- Around line 334-350: In copy_attrname, replace the compile-time asserts with
runtime validations: check that attrIdx < attrs.size() and that len ==
name.size(); if either check fails, raise a runtime error (e.g., throw
std::runtime_error with a clear message mentioning valueId, attrIdx, len and
name.size()) or otherwise signal a trap/error path instead of proceeding; only
perform memcpy when both checks pass and keep the rest of the logic
(state.forceAttrs, obtaining name from state.symbols, and writing into memory())
unchanged.
- Around line 433-439: The static variable name `primop_fromTOML` is misleading
for the WASM primop registration; rename the variable to a descriptive
identifier like `primop_wasm` (i.e., change `static RegisterPrimOp
primop_fromTOML(...` to `static RegisterPrimOp primop_wasm(...`) so it matches
the registered `.name = "wasm"` and `.fun = prim_wasm`; update any local
references to this static variable if present in the file to use the new name.
- Around line 309-332: The copy_attrset function writes attributes into WASM
memory using enumerate(*value.attrs()) which is non-deterministic; to fix,
gather the attributes into a temporary vector (e.g., vector of pairs or pointers
to attr), sort that vector lexicographically by the attribute name string via
state.symbols[attr.name], then iterate the sorted vector to populate
buf[n].value = addValue(attr.value) and buf[n].nameLen =
state.symbols[attr.name].size(), preserving the existing buffer handling
(subspan/Attr/maxLen) and return value.attrs()->size().
- Around line 411-418: The static unordered_map instancesPre is not thread-safe
and leaks entries; replace it with a thread-safe concurrent map (use
boost::concurrent_flat_map keyed by SourcePath) and store weak references
instead of owning refs so entries can be reclaimed (e.g., map value -> weak
pointer to NixWasmInstancePre or equivalent weak_ref). Update the lookup/insert
logic around instancesPre and the use of make_ref<NixWasmInstancePre>(wasmPath)
so you first attempt to lock the weak entry, create a new instance only if
expired, and store a weak reference in the concurrent_flat_map; optionally add
an eviction or LRU policy if unbounded growth remains a concern.
- Around line 49-53: The subspan function currently uses assert to check buffer
size which is omitted in release builds; replace the assert in
subspan(std::span<uint8_t> s, size_t len) with a runtime check that throws a
descriptive exception (e.g., std::out_of_range or std::length_error) when
s.size() < len * sizeof(T), include the required header ( <stdexcept> ), and
keep the returned span construction (use reinterpret_cast<T*>(s.data()) for
clarity) so callers get a safe failure instead of potential buffer overruns.
🧹 Nitpick comments (12)
src/libstore/derivation-options.cc (1)
399-400: Consider requiring a system feature for WASM support.The special-casing of "wasm32-wasip1" allows any system to build WASM derivations, which aligns with the PR's goal of system-independent derivations. However, this assumes Wasmtime is always available.
Consider whether wasm32-wasip1 builds should require a system feature (e.g., "wasm" or "wasmtime") to ensure the runtime is actually available on the build machine. This would be checked by the existing
getRequiredSystemFeatureslogic at lines 406-408.🔎 Alternative approach using system features
If you want to gate WASM builds on explicit system capability:
- Have wasm32-wasip1 derivations declare
requiredSystemFeatures = ["wasm"];- Remove the special case here, letting the normal system feature check at lines 406-408 handle it:
- if (drv.platform != settings.thisSystem.get() && drv.platform != "wasm32-wasip1" - && !settings.extraPlatforms.get().count(drv.platform) && !drv.isBuiltin()) + if (drv.platform != settings.thisSystem.get() + && !settings.extraPlatforms.get().count(drv.platform) && !drv.isBuiltin()) return false;This would make WASM support more explicit and consistent with how other cross-compilation scenarios are handled.
src/libstore/meson.build (1)
383-383: Consider using Meson's dependency system instead of hardcoded linker flags.Directly appending
-lwasmtimetolink_argsbypasses Meson's dependency resolution, which can lead to portability and configuration issues:
- No verification that the wasmtime library is actually available
- Library search paths may not be configured correctly
- Platform-specific library naming conventions (e.g.,
.dllvs.so) are not handled- Build will fail with cryptic linker errors rather than clear dependency errors
🔎 Recommended approach using Meson dependency system
Near the top of the file with other dependencies (around line 109-161), add:
+wasmtime = dependency('wasmtime', required : true) +deps_private += wasmtimeThen remove the hardcoded linker flag:
- link_args : linker_export_flags + [ '-lwasmtime' ], + link_args : linker_export_flags,This provides better error messages, respects pkg-config paths, and handles platform differences automatically.
src/libexpr/meson.build (1)
242-242: Consider using Meson's dependency system instead of hardcoded linker flags.This has the same portability concerns as the identical change in
src/libstore/meson.buildat line 383. Hardcoded-lwasmtimebypasses Meson's dependency resolution and can cause issues with library paths, platform-specific naming, and error reporting.🔎 Recommended approach using Meson dependency system
Near the dependencies section (around lines 41-92), add:
+wasmtime = dependency('wasmtime', required : true) +deps_private += wasmtimeThen remove the hardcoded linker flag:
- link_args : linker_export_flags + [ '-lwasmtime' ], + link_args : linker_export_flags,This approach provides better error messages, respects pkg-config configurations, and handles cross-platform differences automatically.
src/libstore/unix/build/derivation-builder.cc (1)
2033-2035: Early return forwasm32-wasip1bypasses later sandbox / uid‑range checksRouting
params.drv.platform == "wasm32-wasip1"directly toWasiDerivationBuilderhappens before:
- The diverted store check (
storeDir != realStoreDir) and- The
!useSandbox && params.drvOptions.useUidRange(drv)guard.That means for WASI builds:
uid-rangerequests will no longer be rejected when sandboxing is effectively disabled here.- Diverted stores on non‑Linux platforms will also be allowed for this path, unlike other builders.
If this is intentional (relying on the WASI runtime rather than OS sandboxing), consider making that explicit and either:
- Reject
uid-rangeinWasiDerivationBuilder’s path (or in this branch) so configuration errors are surfaced, or- Document that
uid-rangeis ignored forwasm32-wasip1and that OS sandboxing semantics differ from other systems.src/libexpr/primops.cc (1)
168-188: MakingrealisePathnon‑static for reuse is reasonableExposing
SourcePath realisePath(...)(no longerstatic) with the optionalSymlinkResolutionparameter allows other primop modules (e.g. WASM/WASI ones) to share the same path + context resolution logic. The implementation is consistent with existing callers, and the default (Full) preserves previous behavior.Just ensure there is a matching declaration in a header (e.g.
primops.hh) so external uses stay type‑checked and the function remains the single canonical implementation.src/libstore/unix/build/wasi-derivation-builder.cc (1)
5-18: Deduplicateunwrap/string2spanhelpers and tighten const‑correctnessBoth
unwrapandstring2spanappear to be cut‑and‑paste fromsrc/libexpr/primops/wasm.cc, which means future changes to the Wasmtime API or error handling will have to be done in two places.Consider:
- Moving these helpers to a small shared header (e.g. a
wasm-runtime-utils.hh) used by both primops and the WASI builder, and- Switching
string2spanto use astd::span<const uint8_t>(and avoiding the C‑style cast) if the Wasmtime C++ API accepts a const span, to keep const‑correctness and avoid accidental mutation.Please double‑check the current
wasmtime::Module::compilesignature in your pinned Wasmtime version to confirm whetherstd::span<const uint8_t>is acceptable and adjust the helper accordingly.src/libexpr/primops/wasm.cc (6)
6-6: Consider removingusing namespace wasmtime;for clarity.While acceptable in implementation files, explicitly qualifying wasmtime types improves code clarity and prevents potential name collisions with future additions to the nix namespace.
10-15: Refactor: Remove duplicate declaration and include proper header.The
realisePathfunction is already defined insrc/libexpr/primops.cc(lines 167-187). Instead of forward-declaring it here, consider extracting it to a shared header file that both files can include, or making it part of theEvalStateAPI.
94-103: Consider factory method for clearer error handling.The constructor performs heavy operations (WASM compilation via
unwrap()) that can throw. Consider using a static factory method that returnsResult<ref<NixWasmInstancePre>, Error>for clearer error handling and easier testing.
182-196: Simplify error throwing pattern.Line 196 uses an immediately-invoked lambda to throw an error, which is less clear than throwing directly. The ternary chain could also benefit from a switch statement for clarity.
🔎 Proposed refactor
uint32_t get_type(ValueId valueId) { auto & value = *values.at(valueId); state.forceValue(value, noPos); - auto t = value.type(); - return t == nInt ? 1 - : t == nFloat ? 2 - : t == nBool ? 3 - : t == nString ? 4 - : t == nPath ? 5 - : t == nNull ? 6 - : t == nAttrs ? 7 - : t == nList ? 8 - : t == nFunction ? 9 - : []() -> int { throw Error("unsupported type"); }(); + switch (value.type()) { + case nInt: return 1; + case nFloat: return 2; + case nBool: return 3; + case nString: return 4; + case nPath: return 5; + case nNull: return 6; + case nAttrs: return 7; + case nList: return 8; + case nFunction: return 9; + default: throw Error("unsupported type"); + } }
230-238: Consider adding documentation for two-phase copy pattern.This function uses a common but potentially confusing pattern: it only copies data if the buffer is large enough, but always returns the actual size. A brief comment explaining that callers should check
returnValue <= maxLenbefore using the buffer would improve clarity.
426-426: Improve readability by splitting complex expression.Line 426 chains multiple operations in a single expression, making it difficult to read and debug. Consider breaking it into separate steps.
🔎 Proposed refactor
- v = *instance.values.at(instance.runFunction(functionName, {(int32_t) instance.addValue(args[2])}).at(0).i32()); + auto argId = instance.addValue(args[2]); + auto results = instance.runFunction(functionName, {(int32_t) argId}); + auto resultId = results.at(0).i32(); + v = *instance.values.at(resultId);
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
packaging/dependencies.nixpackaging/wasmtime.nixsrc/libexpr/meson.buildsrc/libexpr/package.nixsrc/libexpr/primops.ccsrc/libexpr/primops/meson.buildsrc/libexpr/primops/wasm.ccsrc/libstore/derivation-options.ccsrc/libstore/meson.buildsrc/libstore/unix/build/derivation-builder.ccsrc/libstore/unix/build/wasi-derivation-builder.cc
🧰 Additional context used
🧬 Code graph analysis (3)
src/libstore/derivation-options.cc (1)
src/libstore/include/nix/store/derivation-options.hh (2)
drv(184-184)drv(201-201)
src/libstore/unix/build/wasi-derivation-builder.cc (1)
src/libexpr/primops/wasm.cc (7)
unwrap(20-25)unwrap(20-20)v(131-136)v(131-131)v(138-143)string2span(38-41)string2span(38-38)
src/libexpr/primops/wasm.cc (1)
src/libexpr/primops.cc (4)
state(3218-3224)state(3218-3218)realisePath(168-188)realisePath(168-172)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build_aarch64-darwin / build
🔇 Additional comments (6)
src/libexpr/primops/meson.build (1)
12-12: LGTM!Adding wasm.cc to the primops sources is straightforward and correct.
src/libexpr/package.nix (1)
17-17: LGTM!The wasmtime dependency is correctly added following the same pattern as toml11 and other build-time dependencies.
Also applies to: 68-68
packaging/dependencies.nix (1)
78-79: Wiringwasmtimeinto the dependency scope looks correct
wasmtime = pkgs.callPackage ./wasmtime.nix { };is consistent with how other custom packages are exposed here; no further changes needed.src/libstore/unix/build/derivation-builder.cc (2)
294-298: Naming comment onrealPathInSandboxThe
// FIXME: misnomernote correctly reflects that this helper currently just resolves store paths rather than honoring a sandbox path mapping. No functional impact, so this can remain as a low‑priority cleanup marker.
1996-2001: Includingwasi-derivation-builder.ccalongside other platform builders is consistentAdding
#include "wasi-derivation-builder.cc"next to the other platform-specific builder includes matches the existing “single TU with included .cc files” pattern. No issues from an integration standpoint.packaging/wasmtime.nix (1)
1-121: Wasmtime packaging looks solid; minor note on header-copy path robustnessOverall this derivation is well-structured:
- Uses a pinned Rust toolchain (
rust_1_89) andbuildRustPackage.- Restricts
doCheckto platforms with the required SIMD features and skips problematicaarch64runs.- Splits outputs into
out,dev, andlib, pruning.so/.aaccording toenableShared/enableStatic.- Handles shell completions correctly for both native and cross builds.
- Adds an install-time
--versioncheck and an updateScript with an appropriate version regex.One thing to keep in mind is the header copy:
cp -r target/${cargoShortTarget}/release/build/wasmtime-c-api-impl-*/out/include $dev/includeThis relies on the exact crate/build directory naming and
releaseprofile path; if upstream layout or build profile changes, this will break somewhat opaquely. It may be worth:
- Asserting that exactly one
wasmtime-c-api-impl-*match exists, or- Failing with a clearer message if the glob expands to zero or multiple directories.
Otherwise the packaging is in good shape.
Please double-check against your pinned Wasmtime 40.0.0 sources that the
wasmtime-c-api-impl-*build output layout (and thecargoShortTarget/releasepath) is stable; if upstream changes it, thecp -r .../out/includestep will be the first thing to break.
| static std::span<T> subspan(std::span<uint8_t> s, size_t len) | ||
| { | ||
| assert(s.size() >= len * sizeof(T)); | ||
| return std::span((T *) s.data(), len); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Replace assert with proper error handling.
The assert on line 51 is compiled out in release builds, which could lead to buffer overruns if the size check fails. Use a runtime check that throws an exception instead.
🔎 Proposed fix
template<typename T>
static std::span<T> subspan(std::span<uint8_t> s, size_t len)
{
- assert(s.size() >= len * sizeof(T));
+ if (s.size() < len * sizeof(T))
+ throw Error("buffer too small: expected at least %d bytes, got %d", len * sizeof(T), s.size());
return std::span((T *) s.data(), len);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static std::span<T> subspan(std::span<uint8_t> s, size_t len) | |
| { | |
| assert(s.size() >= len * sizeof(T)); | |
| return std::span((T *) s.data(), len); | |
| } | |
| template<typename T> | |
| static std::span<T> subspan(std::span<uint8_t> s, size_t len) | |
| { | |
| if (s.size() < len * sizeof(T)) | |
| throw Error("buffer too small: expected at least %d bytes, got %d", len * sizeof(T), s.size()); | |
| return std::span((T *) s.data(), len); | |
| } |
🤖 Prompt for AI Agents
In @src/libexpr/primops/wasm.cc around lines 49-53, The subspan function
currently uses assert to check buffer size which is omitted in release builds;
replace the assert in subspan(std::span<uint8_t> s, size_t len) with a runtime
check that throws a descriptive exception (e.g., std::out_of_range or
std::length_error) when s.size() < len * sizeof(T), include the required header
( <stdexcept> ), and keep the returned span construction (use
reinterpret_cast<T*>(s.data()) for clarity) so callers get a safe failure
instead of potential buffer overruns.
| ValueId make_list(uint32_t ptr, uint32_t len) | ||
| { | ||
| auto vs = subspan<ValueId>(memory().subspan(ptr), len); | ||
|
|
||
| auto [valueId, value] = allocValue(); | ||
|
|
||
| auto list = state.buildList(len); | ||
| for (const auto & [n, v] : enumerate(list)) | ||
| v = values.at(vs[n]); // FIXME: endianness | ||
| value.mkList(list); | ||
|
|
||
| return valueId; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address endianness assumption for cross-platform correctness.
The FIXME on line 263 notes that ValueIds are read from WASM memory without endianness conversion. WASM uses little-endian, so this code will fail on big-endian host systems. While big-endian systems are rare, this violates Nix's goal of deterministic, cross-platform builds.
The same issue appears in copy_list (line 278), make_attrset (line 296), and copy_attrset (line 322).
Consider using explicit little-endian conversion functions (e.g., le32toh) when reading multi-byte values from WASM memory.
🤖 Prompt for AI Agents
In @src/libexpr/primops/wasm.cc around lines 255-267, The code reads ValueId
uint32_t values directly from WASM memory in make_list, copy_list, make_attrset,
and copy_attrset, assuming host endianness; fix this by converting each 32-bit
word from little-endian to host order when constructing ValueId (e.g., apply
le32toh or an equivalent portable little-endian-to-host conversion) for every
read from the memory subspan (the vs[] accesses) so ValueId values are correct
on big-endian hosts; update all occurrences in the listed functions to perform
this conversion before indexing into values or assigning into lists/attrsets.
| uint32_t copy_attrset(ValueId valueId, uint32_t ptr, uint32_t maxLen) | ||
| { | ||
| auto & value = *values.at(valueId); | ||
| state.forceAttrs(value, noPos, "while copying an attrset into WASM"); | ||
|
|
||
| if (value.attrs()->size() <= maxLen) { | ||
| // FIXME: endianness. | ||
| struct Attr | ||
| { | ||
| ValueId value; | ||
| uint32_t nameLen; | ||
| }; | ||
|
|
||
| auto buf = subspan<Attr>(memory().subspan(ptr), maxLen); | ||
|
|
||
| // FIXME: for determinism, we should return attributes in lexicographically sorted order. | ||
| for (const auto & [n, attr] : enumerate(*value.attrs())) { | ||
| buf[n].value = addValue(attr.value); | ||
| buf[n].nameLen = state.symbols[attr.name].size(); | ||
| } | ||
| } | ||
|
|
||
| return value.attrs()->size(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix: Ensure deterministic attribute ordering.
The FIXME on line 324 correctly identifies that attributes should be returned in lexicographically sorted order for determinism. Nix's evaluation model requires consistent attribute ordering across evaluations.
🔎 Proposed fix
if (value.attrs()->size() <= maxLen) {
// FIXME: endianness.
struct Attr
{
ValueId value;
uint32_t nameLen;
};
auto buf = subspan<Attr>(memory().subspan(ptr), maxLen);
- // FIXME: for determinism, we should return attributes in lexicographically sorted order.
- for (const auto & [n, attr] : enumerate(*value.attrs())) {
+ // Collect and sort attributes by name for determinism
+ std::vector<std::pair<std::string_view, const Attr*>> sortedAttrs;
+ for (const auto & attr : *value.attrs()) {
+ sortedAttrs.push_back({state.symbols[attr.name], &attr});
+ }
+ std::sort(sortedAttrs.begin(), sortedAttrs.end(),
+ [](const auto & a, const auto & b) { return a.first < b.first; });
+
+ for (const auto & [n, pair] : enumerate(sortedAttrs)) {
+ const auto & attr = *pair.second;
buf[n].value = addValue(attr.value);
buf[n].nameLen = state.symbols[attr.name].size();
}
}Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In @src/libexpr/primops/wasm.cc around lines 309-332, The copy_attrset function
writes attributes into WASM memory using enumerate(*value.attrs()) which is
non-deterministic; to fix, gather the attributes into a temporary vector (e.g.,
vector of pairs or pointers to attr), sort that vector lexicographically by the
attribute name string via state.symbols[attr.name], then iterate the sorted
vector to populate buf[n].value = addValue(attr.value) and buf[n].nameLen =
state.symbols[attr.name].size(), preserving the existing buffer handling
(subspan/Attr/maxLen) and return value.attrs()->size().
| std::monostate copy_attrname(ValueId valueId, uint32_t attrIdx, uint32_t ptr, uint32_t len) | ||
| { | ||
| auto & value = *values.at(valueId); | ||
| state.forceAttrs(value, noPos, "while copying an attr name into WASM"); | ||
|
|
||
| auto & attrs = *value.attrs(); | ||
|
|
||
| assert((size_t) attrIdx < attrs.size()); | ||
|
|
||
| std::string_view name = state.symbols[attrs[attrIdx].name]; | ||
|
|
||
| assert((size_t) len == name.size()); | ||
|
|
||
| memcpy(memory().subspan(ptr, len).data(), name.data(), name.size()); | ||
|
|
||
| return {}; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Replace assertions with runtime checks.
Lines 341 and 345 use assert statements that are compiled out in release builds. This can lead to:
- Line 341: Out-of-bounds array access if
attrIdx >= attrs.size() - Line 345: Buffer overflow if
len != name.size()
🔎 Proposed fix
std::monostate copy_attrname(ValueId valueId, uint32_t attrIdx, uint32_t ptr, uint32_t len)
{
auto & value = *values.at(valueId);
state.forceAttrs(value, noPos, "while copying an attr name into WASM");
auto & attrs = *value.attrs();
- assert((size_t) attrIdx < attrs.size());
+ if ((size_t) attrIdx >= attrs.size())
+ throw Error("attribute index %d out of bounds (size: %d)", attrIdx, attrs.size());
std::string_view name = state.symbols[attrs[attrIdx].name];
- assert((size_t) len == name.size());
+ if ((size_t) len != name.size())
+ throw Error("buffer size mismatch: expected %d bytes, got %d", name.size(), len);
memcpy(memory().subspan(ptr, len).data(), name.data(), name.size());
return {};
}🤖 Prompt for AI Agents
In @src/libexpr/primops/wasm.cc around lines 334-350, In copy_attrname, replace
the compile-time asserts with runtime validations: check that attrIdx <
attrs.size() and that len == name.size(); if either check fails, raise a runtime
error (e.g., throw std::runtime_error with a clear message mentioning valueId,
attrIdx, len and name.size()) or otherwise signal a trap/error path instead of
proceeding; only perform memcpy when both checks pass and keep the rest of the
logic (state.forceAttrs, obtaining name from state.symbols, and writing into
memory()) unchanged.
| // FIXME: make thread-safe. | ||
| // FIXME: make this a weak Boehm GC pointer so that it can be freed during GC. | ||
| static std::unordered_map<SourcePath, ref<NixWasmInstancePre>> instancesPre; | ||
|
|
||
| auto instancePre = instancesPre.find(wasmPath); | ||
| if (instancePre == instancesPre.end()) | ||
| instancePre = instancesPre.emplace(wasmPath, make_ref<NixWasmInstancePre>(wasmPath)).first; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if EvalState or prim_* functions are used in concurrent contexts
rg -n "std::thread|std::async|pthread" --type=cpp -C 2 | head -50
# Check for existing thread-safety patterns in primops
rg -n "std::mutex|std::lock_guard|std::atomic" src/libexpr/ --type=cpp -C 2 | head -50Repository: DeterminateSystems/nix-src
Length of output: 5273
🏁 Script executed:
#!/bin/bash
# Look at the actual file context around lines 411-418
cat -n src/libexpr/primops/wasm.cc | sed -n '400,430p'Repository: DeterminateSystems/nix-src
Length of output: 1640
🏁 Script executed:
#!/bin/bash
# Check how EvalState handles concurrent evaluation
rg -n "EvalState.*thread|thread.*EvalState" --type=cpp -C 3 | head -40
# Look for parallel evaluation support
rg -n "parallel|concurrent|thread-safe" src/libexpr/ --type=cpp | grep -i "eval\|prim" | head -20Repository: DeterminateSystems/nix-src
Length of output: 2211
Fix thread-unsafe static cache in prim_wasm.
The static instancesPre map on line 413 is not thread-safe and will race under parallel evaluation. Nix supports parallel evaluation (see builtins.parallel and parallel-eval.cc), making this a critical issue.
The cache grows unbounded and is never freed. Use boost::concurrent_flat_map (already used elsewhere in EvalState for srcToStore and importResolutionCache) and consider weak pointers or LRU eviction to prevent memory bloat.
🤖 Prompt for AI Agents
In @src/libexpr/primops/wasm.cc around lines 411-418, The static unordered_map
instancesPre is not thread-safe and leaks entries; replace it with a thread-safe
concurrent map (use boost::concurrent_flat_map keyed by SourcePath) and store
weak references instead of owning refs so entries can be reclaimed (e.g., map
value -> weak pointer to NixWasmInstancePre or equivalent weak_ref). Update the
lookup/insert logic around instancesPre and the use of
make_ref<NixWasmInstancePre>(wasmPath) so you first attempt to lock the weak
entry, create a new instance only if expired, and store a weak reference in the
concurrent_flat_map; optionally add an eviction or LRU policy if unbounded
growth remains a concern.
| static RegisterPrimOp primop_fromTOML( | ||
| {.name = "wasm", | ||
| .args = {"wasm", "entry", "arg"}, | ||
| .doc = R"( | ||
| Call a WASM function with the specified argument. | ||
| )", | ||
| .fun = prim_wasm}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix misleading variable name.
The variable is named primop_fromTOML but actually registers the wasm primop. This appears to be a copy-paste error from another primop definition.
🔎 Proposed fix
-static RegisterPrimOp primop_fromTOML(
+static RegisterPrimOp primop_wasm(
{.name = "wasm",
.args = {"wasm", "entry", "arg"},
- .doc = R"(
- Call a WASM function with the specified argument.
- )",
+ .doc = R"(
+ Call a WASM function from a WebAssembly module.
+
+ Arguments:
+ - wasm: Path to the .wasm file
+ - entry: Name of the exported function to call
+ - arg: Argument to pass to the function (Nix value)
+
+ Returns the result from the WASM function as a Nix value.
+ The WASM module must export a `nix_wasm_init_v1` function.
+ )",
.fun = prim_wasm});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static RegisterPrimOp primop_fromTOML( | |
| {.name = "wasm", | |
| .args = {"wasm", "entry", "arg"}, | |
| .doc = R"( | |
| Call a WASM function with the specified argument. | |
| )", | |
| .fun = prim_wasm}); | |
| static RegisterPrimOp primop_wasm( | |
| {.name = "wasm", | |
| .args = {"wasm", "entry", "arg"}, | |
| .doc = R"( | |
| Call a WASM function from a WebAssembly module. | |
| Arguments: | |
| - wasm: Path to the .wasm file | |
| - entry: Name of the exported function to call | |
| - arg: Argument to pass to the function (Nix value) | |
| Returns the result from the WASM function as a Nix value. | |
| The WASM module must export a `nix_wasm_init_v1` function. | |
| )", | |
| .fun = prim_wasm}); |
🤖 Prompt for AI Agents
In @src/libexpr/primops/wasm.cc around lines 433-439, The static variable name
`primop_fromTOML` is misleading for the WASM primop registration; rename the
variable to a descriptive identifier like `primop_wasm` (i.e., change `static
RegisterPrimOp primop_fromTOML(...` to `static RegisterPrimOp primop_wasm(...`)
so it matches the registered `.name = "wasm"` and `.fun = prim_wasm`; update any
local references to this static variable if present in the file to use the new
name.
| void execBuilder(const Strings & args, const Strings & envStrs) override | ||
| { | ||
| using namespace wasmtime; | ||
|
|
||
| Engine engine; | ||
| Linker linker(engine); | ||
| unwrap(linker.define_wasi()); | ||
|
|
||
| WasiConfig wasiConfig; | ||
| wasiConfig.inherit_stdin(); | ||
| wasiConfig.inherit_stdout(); | ||
| wasiConfig.inherit_stderr(); | ||
| wasiConfig.argv(std::vector(args.begin(), args.end())); | ||
| { | ||
| std::vector<std::pair<std::string, std::string>> env2; | ||
| for (auto & [k, v] : env) | ||
| env2.emplace_back(k, rewriteStrings(v, inputRewrites)); | ||
| wasiConfig.env(env2); | ||
| } | ||
| if (!wasiConfig.preopen_dir( | ||
| store.config->realStoreDir.get(), | ||
| store.storeDir, | ||
| WASMTIME_WASI_DIR_PERMS_READ | WASMTIME_WASI_DIR_PERMS_WRITE, | ||
| WASMTIME_WASI_FILE_PERMS_READ | WASMTIME_WASI_FILE_PERMS_WRITE)) | ||
| throw Error("cannot add store directory to WASI config"); | ||
| // FIXME: add temp dir | ||
|
|
||
| auto module = unwrap(Module::compile(engine, string2span(readFile(realPathInSandbox(drv.builder))))); | ||
| wasmtime::Store wasmStore(engine); | ||
| unwrap(wasmStore.context().set_wasi(std::move(wasiConfig))); | ||
| auto instance = unwrap(linker.instantiate(wasmStore, module)); | ||
|
|
||
| auto startName = "_start"; | ||
| auto ext = instance.get(wasmStore, startName); | ||
| if (!ext) | ||
| throw Error("WASM module '%s' does not export function '%s'", drv.builder, startName); | ||
| auto fun = std::get_if<Func>(&*ext); | ||
| if (!fun) | ||
| throw Error("export '%s' of WASM module '%s' is not a function", startName, drv.builder); | ||
|
|
||
| unwrap(fun->call(wasmStore.context(), {})); | ||
|
|
||
| _exit(0); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cd src/libstore/unix/build && wc -l wasi-derivation-builder.ccRepository: DeterminateSystems/nix-src
Length of output: 99
🏁 Script executed:
cd src/libstore/unix/build && head -80 wasi-derivation-builder.ccRepository: DeterminateSystems/nix-src
Length of output: 2592
🏁 Script executed:
rg -n "class.*DerivationBuilder" src/libstore/unix/build/ -A 5Repository: DeterminateSystems/nix-src
Length of output: 510
🏁 Script executed:
rg -n "initEnv" src/libstore/unix/build/ -A 10Repository: DeterminateSystems/nix-src
Length of output: 2704
🏁 Script executed:
rg -n "runChild" src/libstore/unix/build/build-user.cc -A 15Repository: DeterminateSystems/nix-src
Length of output: 147
🏁 Script executed:
rg -n "runChild" src/libstore/ -B 2 -A 15Repository: DeterminateSystems/nix-src
Length of output: 9282
🏁 Script executed:
sed -n '1057,1150p' src/libstore/unix/build/derivation-builder.ccRepository: DeterminateSystems/nix-src
Length of output: 4230
🏁 Script executed:
rg -n "void.*execBuilder" src/libstore/ -B 5 -A 5Repository: DeterminateSystems/nix-src
Length of output: 3761
🏁 Script executed:
sed -n '1276,1360p' src/libstore/unix/build/derivation-builder.ccRepository: DeterminateSystems/nix-src
Length of output: 2796
🏁 Script executed:
rg -n "tmpDirInSandbox" src/libstore/unix/build/ -A 3Repository: DeterminateSystems/nix-src
Length of output: 4675
🏁 Script executed:
sed -n '1340,1390p' src/libstore/unix/build/derivation-builder.ccRepository: DeterminateSystems/nix-src
Length of output: 2017
🏁 Script executed:
rg -n "for.*env\|envStrs" src/libstore/unix/build/derivation-builder.cc -A 3 -B 2 | grep -A 5 "1340\|1350\|1360\|1370"Repository: DeterminateSystems/nix-src
Length of output: 52
🏁 Script executed:
sed -n '1350,1380p' src/libstore/unix/build/derivation-builder.ccRepository: DeterminateSystems/nix-src
Length of output: 1016
Address unused envStrs parameter and duplicate env rewriting
The envStrs parameter is computed in runChild() by iterating over env and applying rewriteStrings() to each key=value pair, then passed to execBuilder(). The WASI override ignores this and rebuilds env2 using identical logic. Reuse envStrs by parsing it to avoid duplication and ensure consistency if env handling changes.
Preopen the temp directory and adjust store permissions
initEnv() sets TMPDIR, NIX_BUILD_TOP, and PWD to tmpDirInSandbox(), but only the store directory is preopened in the WASI config. WASM builders attempting to write to $TMPDIR or access $NIX_BUILD_TOP will fail. Additionally, the store is preopened with READ|WRITE permissions on all files; typical chroot builds expose stores read-only, with only outputs writable. Add a preopen for the temp directory and consider making the store read-only while exposing outputs or scratch locations separately to match existing sandbox assumptions.
This function was named incorrectly. It denotes the path in the host filesystem, not in the sandbox.
Even though it's in a separate output, disabling it reduces the size of libwasmtime.so by around 12 MB.
This reduces the size of the Nix closure by another 12 MB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a copyright notice that points back to nixpkgs, since this is a verbatim copy of the package definition from there.
Motivation
This PR adds support for extending Nix using WASM (specifically
wasmtime) in two ways:It adds a new builtin function,
builtins.wasm, that allows Nix functions to be written in any language that compiles to WASM. For example,The goal is to allow complex functionality (like YAML parsers) to be written in more convenient languages (like Rust) and executed faster, without having to resort to import-from-derivation.
The WASM function can lazily traverse its argument and can produce arbitrary Nix values (except functions). Each WASM function call is executed as a fresh instance of the WASM VM, ensuring determinism (i.e. you can't use the WASM memory to pass state between function calls).
See Initial bindings/examples for writing Nix functions in Rust nix-wasm-rust#1 for examples of Nix functions written in Rust.
It adds a new system type,
wasm32-wasip1, that allows for system-independent derivations. For example:Example here: https://github.com/DeterminateSystems/nix-wasi-rust
Context
Summary by CodeRabbit
Release Notes
✏️ Tip: You can customize this high-level summary in your review settings.